The Issues with Non-Generic Collections

When the .NET platform was first released, programmers frequently used the System.Collections namespace of mscorlib.dll. Here, developers were provided with a set of classes that allowed them to manage and organize large amounts of data. Table 10-1 documents some of the more commonly used collection classes, and the core interfaces they implement.

Table 10-1. Commonly Used Classes of System.Collections

System.Collections Class	Meaning in Life	Key Implemented Interfaces
ArrayList	Represents a dynamically sized collection of objects listed in sequential order.	IList, ICollection, IEnumerable, and ICloneable
Hashtable	Represents a collection of key/value pairs that are organized based on the hash code of the key.	IDictionary, ICollection, IEnumerable, and ICloneable
Queue	Represents a standard first-in, first-out (FIFO) queue.	ICollection, IEnumerable, and ICloneable
SortedList	Represents a collection of key/value pairs that are sorted by the keys and are accessible by key and by index.	IDictionary, ICollection, IEnumerable, and ICloneable
Stack	A last-in, first-out (LIFO) stack providing push and pop (and peek) functionality.	ICollection, IEnumerable, and ICloneable

The interfaces implemented by these classic collection classes provide huge insights into their overall functionality. Table 10-2 documents the overall nature of these key interfaces, some of which you worked with first-hand in Chapter 9.

Table 10-2. Key Interfaces Supported by Classes of System.Collections

System.Collections Interface	Meaning in Life
ICollection	Defines general characteristics (e.g., size, enumeration, and thread safety) for all non-generic collection types.
ICloneable	Allows the implementing object to return a copy of itself to the caller.
IDictionary	Allows a non-generic collection object to represent its contents using key/value pairs.
IEnumerable	Returns an object implementing the IEnumerator interface (see next table entry).
IEnumerator	Enables foreach style iteration of collection items.
IList	Provides behavior to add, remove, and index items in a sequential list of objects.

In addition to these core classes (and interfaces), the System.Collections.Specialized namespace of System.dll added a few (pardon the redundancy) specialized collection types such as BitVector32, ListDictionary, StringDictionary, and StringCollection. This namespace also contains many additional interfaces and abstract base classes that you can use as a starting point for creating custom collection classes.

While it is true that many successful .NET applications have been built over the years using these "classic" collection classes (and interfaces), history has shown that use of these types can result in a number of issues.

The first issue is that using the System.Collections and System.Collections.Specialized classes can result in some poorly performing code, especially when you are manipulating data structures (e.g., value types). As you’ll see momentarily, the CLR must perform a number of memory transfer operations when you store structures in a classic collection class, which can hurt runtime execution speed.

The second issue is that these classic collection classes are not type safe because they were (by and large) developed to operate on System.Objects, and they could therefore contain anything at all. If a .NET developer needed to create a highly type-safe collection (e.g., a container that can only hold objects implementing a certain interface), the only real choice was to create a brand new collection class by hand. Doing so was not too labor intensive, but it was a tad bit on the tedious side.

Given these (and other) issues, .NET 2.0 introduced a brand new set of collection classes, which are packaged up in the System.Collections.Generic namespace. Any new project created with .NET 2.0 and higher should ignore the legacy, non-generic classes in favor of the corresponding generic classes.

Note This is worth repeating: Any .NET application built with .NET 2.0 or higher should ignore the classes in System.Collections in favor of the classes in System.Collections.Generic.

Before you look at how to use generics in your programs, you’ll find it helpful to examine the issues of non-generic collection classes a bit closer; this will help you better understand the problems generics intend to solve in the first place. If you wish to follow along, create a new Console Application named IssuesWithNonGenericCollections. Next, import the System.Collections namespace at the top of your C# code file:

using System.Collections;

The Issue of Performance

As you might recall from Chapter 4, the .NET platform supports two broad categories of data: value types and reference types. Given that .NET defines two major categories of types, you might occasionally need to represent a variable of one category as a variable of the other category. To do so, C# provides a simple mechanism, termed boxing, to store the data in a value type within a reference variable. Assume that you have created a local variable of type int in a method called SimpleBoxUnboxOperation():

static void SimpleBoxUnboxOperation()
{
    // Make a ValueType (int) variable.
    int myInt = 25;
}

If, during the course of your application, you were to represent this value type as a reference type, you would box the value, as follows:

private static void SimpleBoxUnboxOperation()
{
    // Make a ValueType (int) variable.
    int myInt = 25;

    // Box the int into an object reference.
    object boxedInt = myInt;
}

Boxing can be formally defined as the process of assigning a value type to a System.Object variable. When you box a value, the CLR allocates a new object on the heap and copies the value type’s value (25, in this case) into that instance. What is returned to you is a reference to the newly allocated heap-based object. If you use this technique, you don’t need to use a set of wrapper classes to treat stack data temporarily as heap-allocated objects.
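The copy semantics just described can be observed directly. The following is a minimal sketch rather than part of the chapter's project; the BoxingCopyDemo class and its method names are hypothetical and exist only for illustration:

```csharp
using System;

public static class BoxingCopyDemo
{
    // Returns the boxed value after the original variable has been changed.
    public static int BoxedValueAfterChange()
    {
        int myInt = 25;
        object boxedInt = myInt;   // Copies 25 into a new heap object.
        myInt = 99;                // Changes only the original variable.
        return (int)boxedInt;
    }

    // Returns true if boxing the same variable twice yields two distinct objects.
    public static bool BoxesAreDistinct()
    {
        int myInt = 25;
        object first = myInt;
        object second = myInt;
        return !object.ReferenceEquals(first, second) && first.Equals(second);
    }

    public static void Main()
    {
        Console.WriteLine(BoxedValueAfterChange());  // 25
        Console.WriteLine(BoxesAreDistinct());       // True
    }
}
```

Because each boxing operation allocates its own object, the two references compare as different objects even though the values they hold are equal.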

The opposite operation is also permitted through unboxing. Unboxing is the process of converting the value held in the object reference back into a corresponding value type on the stack. Syntactically speaking, an unboxing operation looks like a normal casting operation. However, the semantics are quite different. The CLR begins by verifying that the receiving data type is equivalent to the boxed type; and if so, it copies the value back into a local stack-based variable. For example, the following unboxing operation works successfully, given that the underlying type of boxedInt is indeed an int:

private static void SimpleBoxUnboxOperation()
{
    // Make a ValueType (int) variable.
    int myInt = 25;

    // Box the int into an object reference.
    object boxedInt = myInt;

    // Unbox the reference back into a corresponding int.
    int unboxedInt = (int)boxedInt;
}

When the C# compiler encounters boxing/unboxing syntax, it emits CIL code that contains the box/unbox op codes. If you were to examine your compiled assembly using ildasm.exe, you would find the following:

.method private hidebysig static void SimpleBoxUnboxOperation() cil managed
{
 // Code size 19 (0x13)
 .maxstack 1
 .locals init ([0] int32 myInt, [1] object boxedInt, [2] int32 unboxedInt)
 IL_0000: nop
 IL_0001: ldc.i4.s 25
 IL_0003: stloc.0
 IL_0004: ldloc.0
 IL_0005: box [mscorlib]System.Int32
 IL_000a: stloc.1
 IL_000b: ldloc.1
 IL_000c: unbox.any [mscorlib]System.Int32
 IL_0011: stloc.2
 IL_0012: ret
} // end of method Program::SimpleBoxUnboxOperation

Remember that unlike when performing a typical cast, you must unbox into an appropriate data type. If you attempt to unbox a piece of data into the incorrect data type, an InvalidCastException will be thrown. To be perfectly safe, you should wrap each unboxing operation in try/catch logic; however, this would be quite labor intensive to do for every unboxing operation. Consider the following code update, which will throw an exception because you’re attempting to unbox the boxed int into a long:

private static void SimpleBoxUnboxOperation()
{
    // Make a ValueType (int) variable.
    int myInt = 25;

    // Box the int into an object reference.
    object boxedInt = myInt;

    // Unbox in the wrong data type to trigger
    // runtime exception.
    try
    {
        long unboxedInt = (long)boxedInt;
    }
    catch (InvalidCastException ex)
    {
        Console.WriteLine(ex.Message);
    }
}
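An alternative to wrapping the cast in try/catch is to test the boxed type before unboxing. The following sketch assumes a hypothetical helper named TryUnboxToLong(), which is not part of the chapter's project; it unboxes to the exact type (int) first and only then widens the result to a long:

```csharp
using System;

public static class SafeUnboxDemo
{
    // Hypothetical helper: converts a boxed int to a long without throwing.
    public static bool TryUnboxToLong(object boxed, out long result)
    {
        if (boxed is int)
        {
            // Unbox to the exact boxed type, then let the int widen to long.
            result = (int)boxed;
            return true;
        }

        result = 0;
        return false;
    }

    public static void Main()
    {
        object boxedInt = 25;
        object boxedDouble = 3.14;
        long value;

        if (TryUnboxToLong(boxedInt, out value))
            Console.WriteLine("Unboxed value: {0}", value);

        if (!TryUnboxToLong(boxedDouble, out value))
            Console.WriteLine("Not a boxed int; nothing was unboxed.");
    }
}
```

The type test costs a little extra work on each call, but it avoids relying on an exception for a case you can anticipate.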

At first glance, boxing/unboxing might seem like a rather uneventful language feature that is more academic than practical. After all, you will seldom store a local value type in a local object variable, as seen here. However, it turns out that the boxing/unboxing process is quite helpful because it allows you to assume everything can be treated as a System.Object, while the CLR takes care of the memory-related details on your behalf.

Let’s look at a practical use of these techniques. Assume you have created a non-generic System.Collections.ArrayList to hold onto a batch of numeric (stack-allocated) data. If you were to examine the members of ArrayList, you would find they are prototyped to operate on System.Object data. Now consider the Add(), Insert(), and Remove() methods, as well as the class indexer:

public class ArrayList : object,
    IList, ICollection, IEnumerable, ICloneable
{
...
    public virtual int Add(object value);
    public virtual void Insert(int index, object value);
    public virtual void Remove(object obj);
    public virtual object this[int index] {get; set; }
}

ArrayList has been built to operate on objects, which represent data allocated on the heap, so it might seem strange that the following code compiles and executes without throwing an error:

static void WorkWithArrayList()
{
    // Value types are automatically boxed when
    // passed to a method requesting an object.
    ArrayList myInts = new ArrayList();
    myInts.Add(10);
    myInts.Add(20);
    myInts.Add(35);
    Console.ReadLine();
}

Although you pass numerical data directly into methods requiring an object, the runtime automatically boxes the stack-based data on your behalf.

Later, if you wish to retrieve an item from the ArrayList using the type indexer, you must unbox the heap-allocated object into a stack-allocated integer using a casting operation. Remember that the indexer of the ArrayList is returning System.Objects, not System.Int32s:

static void WorkWithArrayList()
{
    // Value types are automatically boxed when
    // passed to a member requesting an object.
    ArrayList myInts = new ArrayList();
    myInts.Add(10);
    myInts.Add(20);
    myInts.Add(35);

    // Unboxing occurs when an object is converted back to
    // stack-based data.
    int i = (int)myInts[0];

    // Now it is reboxed, as WriteLine() requires object types!
    Console.WriteLine("Value of your int: {0}", i);
    Console.ReadLine();
}

Again, note that the stack-allocated System.Int32 is boxed prior to the call to ArrayList.Add() so it can be passed in the required System.Object. Also note that the System.Object is unboxed back into a System.Int32 once it is retrieved from the ArrayList using the type indexer, only to be boxed again when it is passed to the Console.WriteLine() method, as this method is operating on System.Object variables.

Boxing and unboxing are convenient from a programmer’s point of view, but this simplified approach to stack/heap memory transfer comes with the baggage of performance issues (in both speed of execution and code size) and a lack of type safety. To understand the performance issues, ponder the steps that must occur to box and unbox a simple integer:

A new object must be allocated on the managed heap.
The value of the stack-based data must be transferred into that memory location.
When unboxed, the value stored on the heap-based object must be transferred back to the stack.
The now unused object on the heap will (eventually) be garbage collected.
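To get a rough feel for these steps, you could time the same loop against a boxing container and a non-boxing one. The sketch below is illustrative only: the BoxingCostDemo class and its method names are hypothetical, the generic List<int> comes from the System.Collections.Generic namespace mentioned earlier, and the timings will vary by machine and runtime:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

public static class BoxingCostDemo
{
    // Every Add() boxes an int; every indexer read must be unboxed.
    public static long SumWithArrayList(int count)
    {
        ArrayList items = new ArrayList();
        for (int i = 0; i < count; i++)
            items.Add(i);

        long sum = 0;
        for (int i = 0; i < items.Count; i++)
            sum += (int)items[i];
        return sum;
    }

    // The generic list stores the ints directly; no boxing or casting occurs.
    public static long SumWithGenericList(int count)
    {
        List<int> items = new List<int>();
        for (int i = 0; i < count; i++)
            items.Add(i);

        long sum = 0;
        for (int i = 0; i < items.Count; i++)
            sum += items[i];
        return sum;
    }

    public static void Main()
    {
        const int count = 1000000;

        Stopwatch timer = Stopwatch.StartNew();
        long boxedSum = SumWithArrayList(count);
        timer.Stop();
        Console.WriteLine("ArrayList: sum = {0}, {1} ms", boxedSum, timer.ElapsedMilliseconds);

        timer = Stopwatch.StartNew();
        long genericSum = SumWithGenericList(count);
        timer.Stop();
        Console.WriteLine("List<int>: sum = {0}, {1} ms", genericSum, timer.ElapsedMilliseconds);
    }
}
```

Both methods compute the same sum; only the container differs. A single Stopwatch run like this is a crude measurement, so treat the numbers as an indication rather than a benchmark.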

Although this particular WorkWithArrayList() method won’t cause a major bottleneck in terms of performance, you could certainly feel the impact if an ArrayList contained thousands of integers that your program manipulates on a somewhat regular basis. In an ideal world, you could manipulate stack-based data in a container without any performance issues. Ideally, it would be nice if you did not have to bother plucking data from this container using try/catch scopes (this is exactly what generics let you achieve).

The Issue of Type Safety

You touched on the issue of type safety when you looked at unboxing operations. Recall that you must unbox your data into the same data type it was declared as before boxing. However, there is another aspect of type safety you must keep in mind in a generic-free world: the fact that a majority of the classes of System.Collections can typically hold anything whatsoever because their members are prototyped to operate on System.Objects. For example, this method builds an ArrayList of random bits of unrelated data:

static void ArrayListOfRandomObjects()
{
    // The ArrayList can hold anything at all.
    ArrayList allMyObjects = new ArrayList();
    allMyObjects.Add(true);
    allMyObjects.Add(new OperatingSystem(PlatformID.MacOSX, new Version(10, 0)));
    allMyObjects.Add(66);
    allMyObjects.Add(3.14);
}

In some cases, you will require an extremely flexible container that can hold literally anything (as seen here). However, most of the time you desire a type-safe container that can only operate on a particular type of data point. For example, you might need a container that can only hold database connections, bitmaps, or IPointy-compatible objects.
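The cost of that flexibility tends to surface at runtime, not compile time. The following sketch (the MixedArrayListDemo class and its TrySumInts() helper are hypothetical names used for illustration) compiles cleanly but fails as soon as the loop reaches an element that is not an int:

```csharp
using System;
using System.Collections;

public static class MixedArrayListDemo
{
    // Tries to sum an ArrayList that is assumed to contain only ints.
    // Returns false if any element turns out to be something else; in that
    // case, sum holds only the total accumulated before the failure.
    public static bool TrySumInts(ArrayList items, out int sum)
    {
        sum = 0;
        try
        {
            // foreach performs a hidden cast (and unbox) on every element.
            foreach (int i in items)
                sum += i;
            return true;
        }
        catch (InvalidCastException)
        {
            return false;
        }
    }

    public static void Main()
    {
        ArrayList values = new ArrayList();
        values.Add(66);
        values.Add(3.14);   // Compiles fine, but this is a boxed double.

        int sum;
        if (!TrySumInts(values, out sum))
            Console.WriteLine("The list contained something other than an int.");
    }
}
```

Nothing in the declaration of the ArrayList tells the compiler (or the next programmer) what the container is supposed to hold, so the mistake is only found when the code runs.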

Prior to generics, the only way you could address this issue of type safety was to create a custom (strongly typed) collection class manually. Assume you wish to create a custom collection that can only contain objects of type Person:

public class Person
{
    public int Age {get; set;}
    public string FirstName {get; set;}
    public string LastName {get; set;}
    public Person(){}
    public Person(string firstName, string lastName, int age)
    {
        Age = age;
        FirstName = firstName;
        LastName = lastName;
    }
    
    public override string ToString()
    {
        return string.Format("Name: {0} {1}, Age: {2}",
            FirstName, LastName, Age);
    }
}

To build a Person-only collection, you could define a System.Collections.ArrayList member variable within a class named PersonCollection and configure all members to operate on strongly typed Person objects, rather than on System.Object types. Here is a simple example (a production-level custom collection could support many additional members and might extend an abstract base class from the System.Collections namespace):

public class PersonCollection : IEnumerable
{
    private ArrayList arPeople = new ArrayList();

    // Cast for caller.
    public Person GetPerson(int pos)
    { return (Person)arPeople[pos]; }

    // Only insert Person objects.
    public void AddPerson(Person p)
    { arPeople.Add(p); }

    public void ClearPeople()
    { arPeople.Clear(); }

    public int Count
    { get { return arPeople.Count; } }
    
    // Foreach enumeration support.
    IEnumerator IEnumerable.GetEnumerator()
    { return arPeople.GetEnumerator(); }
}

Notice that the PersonCollection class implements the IEnumerable interface, which allows a foreach-like iteration over each contained item. Also notice that your GetPerson() and AddPerson() methods have been prototyped to operate only on Person objects, not bitmaps, strings, database connections, or other items. With these types defined, you are now assured of type safety, given that the C# compiler will be able to detect any attempt to insert an incompatible data type:

static void UsePersonCollection()
{
    Console.WriteLine("***** Custom Person Collection *****\n");
    PersonCollection myPeople = new PersonCollection();
    myPeople.AddPerson(new Person("Homer", "Simpson", 40));
    myPeople.AddPerson(new Person("Marge", "Simpson", 38));
    myPeople.AddPerson(new Person("Lisa", "Simpson", 9));
    myPeople.AddPerson(new Person("Bart", "Simpson", 7));
    myPeople.AddPerson(new Person("Maggie", "Simpson", 2));
    
    // This would be a compile-time error!
    // myPeople.AddPerson(new Car());
    
    foreach (Person p in myPeople)
        Console.WriteLine(p);
}

While custom collections do ensure type safety, this approach leaves you in a position where you must create an (almost identical) custom collection for each unique data type you wish to contain. Thus, if you need a custom collection that can operate only on classes deriving from the Car base class, you need to build a highly similar collection class:

public class CarCollection : IEnumerable
{
    private ArrayList arCars = new ArrayList();

    // Cast for caller.
    public Car GetCar(int pos)
    { return (Car) arCars[pos]; }

    // Only insert Car objects.
    public void AddCar(Car c)
    { arCars.Add(c); }

    public void ClearCars()
    { arCars.Clear(); }

    public int Count
    { get { return arCars.Count; } }

    // Foreach enumeration support.
    IEnumerator IEnumerable.GetEnumerator()
    { return arCars.GetEnumerator(); }
}

However, a custom collection class does nothing to solve the issue of boxing/unboxing penalties. Even if you were to create a custom collection named IntCollection that you designed to operate only on System.Int32 items, you would have to allocate some type of object to hold the data (e.g., System.Array and ArrayList):

public class IntCollection : IEnumerable
{
    private ArrayList arInts = new ArrayList();

    // Unbox for caller.
    public int GetInt(int pos)
    { return (int)arInts[pos]; }

    // Boxing operation!
    public void AddInt(int i)
    { arInts.Add(i); }

    public void ClearInts()
    { arInts.Clear(); }

    public int Count
    { get { return arInts.Count; } }

    IEnumerator IEnumerable.GetEnumerator()
    { return arInts.GetEnumerator(); }
}

Regardless of which type you might choose to hold the integers, you cannot escape the boxing dilemma using non-generic containers.

When you use generic collection classes, you rectify all of the previous issues, including boxing/unboxing penalties and a lack of type safety. Also, the need to build a custom (generic) collection class manually becomes quite rare. Rather than having to build unique classes that can contain people, cars, and integers, you can use a generic collection class and specify the type of item it should contain. Consider the following method, which uses a generic List<> class (in the System.Collections.Generic namespace) to contain various types of data in a strongly typed manner (don’t fret the details of generic syntax at this time):

static void UseGenericList()
{
    Console.WriteLine("***** Fun with Generics *****\n");

    // This List<> can only hold Person objects.
    List<Person> morePeople = new List<Person>();
    morePeople.Add(new Person ("Frank", "Black", 50));
    Console.WriteLine(morePeople[0]);

    // This List<> can only hold integers.
    List<int> moreInts = new List<int>();
    moreInts.Add(10);
    moreInts.Add(2);
    int sum = moreInts[0] + moreInts[1];

    // Compile-time error! Can't add Person object
    // to a list of ints!
    // moreInts.Add(new Person());
}

The first List<> object can only contain Person objects. Therefore, you do not need to perform a cast when plucking the items from the container, which makes this approach more type safe. The second List<> can only contain integers, which the list stores directly as int values rather than as individually boxed objects; in other words, there is no hidden boxing or unboxing as you found with the non-generic ArrayList.

Here is a short list of the benefits generic containers provide over their non-generic counterparts:

Generics provide better performance because they do not result in boxing or unboxing penalties.
Generics are more type safe because they can only contain the type of item you specify.
Generics greatly reduce the need to build custom collection types because the base class library provides several prefabricated containers.
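To see how these benefits relate to the hand-built PersonCollection, CarCollection, and IntCollection classes shown earlier, consider the following sketch. The SimpleCollection<T> class is a hypothetical type written only for illustration (in practice you would normally use List<T> directly), and it assumes nothing beyond List<T> from System.Collections.Generic:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical generic wrapper: one class definition stands in for the
// three nearly identical custom collections.
public class SimpleCollection<T> : IEnumerable<T>
{
    private readonly List<T> items = new List<T>();

    // No cast (and no unboxing) is needed; the element is already a T.
    public T GetItem(int pos)
    { return items[pos]; }

    // Only T-compatible values can be inserted; no boxing for value types.
    public void AddItem(T item)
    { items.Add(item); }

    public void ClearItems()
    { items.Clear(); }

    public int Count
    { get { return items.Count; } }

    // Foreach enumeration support.
    public IEnumerator<T> GetEnumerator()
    { return items.GetEnumerator(); }

    IEnumerator IEnumerable.GetEnumerator()
    { return GetEnumerator(); }
}

public static class SimpleCollectionDemo
{
    public static void Main()
    {
        SimpleCollection<int> ints = new SimpleCollection<int>();
        ints.AddItem(10);
        ints.AddItem(2);
        Console.WriteLine(ints.GetItem(0) + ints.GetItem(1));  // 12

        SimpleCollection<string> names = new SimpleCollection<string>();
        names.AddItem("Homer");
        names.AddItem("Marge");

        // This would be a compile-time error!
        // names.AddItem(42);

        foreach (string name in names)
            Console.WriteLine(name);
    }
}
```

The type parameter T is filled in at each point of use, so the compiler enforces the element type for every instantiation without any per-type code.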

Source Code You can find the IssuesWithNonGenericCollections project under the Chapter 10 directory.